{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ddc7185d",
   "metadata": {},
   "source": [
    "# Resume-to-Job Gap Analysis Tool"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe7462c2",
   "metadata": {},
   "source": [
    "### **Project Summary**\n",
    "This project uses a Large Language Model (LLM) to automate a tedious task with real-world business value: manually comparing a candidate's resume against a job description. Given the URL of a Y Combinator job posting and the candidate's resume text, the notebook generates a two-part report. The first part is a gap analysis that lists the candidate's key strengths (with supporting evidence from the resume), identifies job requirements the resume does not cover, and assigns an overall suitability score. The second part is a draft cover letter. Recruiters can use the analysis to screen candidates more efficiently, and applicants can use it to tailor their resumes.\n",
    "\n",
    "### **How to Use**\n",
    "1.  **Set up your Environment**: Make sure you have a `.env` file in the root directory with your `OPENAI_API_KEY`.\n",
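    "2.  **Input the Job URL**: In **Section 2**, paste the URL of a Y Combinator job posting into the `job_url` variable. The scraper in **Section 4** depends on the page structure of `ycombinator.com` job listings, so other sites are unlikely to work without changes.\n",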
    "3.  **Input the Resume**: In **Section 2**, paste the candidate's full resume text into the `resume_text` variable.\n",
    "4.  **Run the Notebook**: Execute the cells from top to bottom. The final cell in **Section 6** will display the formatted analysis report.\n",
    "\n",
    "### **A Note on Ethical Web Scraping**\n",
    "This tool uses the `requests` library to fetch website content. Keep the following in mind:\n",
    "* The request sends a browser-style `User-Agent` header. This is common practice because some servers block requests that lack one.\n",
    "* **Always respect the website's terms of service.** Automated scraping may be disallowed on some sites. This tool is intended for educational purposes and should only be used on publicly accessible job postings where such activity is permitted."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a01b5d2",
   "metadata": {},
   "source": [
    "## 1. Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "caca8d9a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Imports\n",
    "import os\n",
    "import requests\n",
    "from dotenv import load_dotenv\n",
    "from bs4 import BeautifulSoup\n",
    "from IPython.display import Markdown, display\n",
    "from openai import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e2db03e8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load Environment Variables\n",
    "load_dotenv(override=True)\n",
    "api_key = os.getenv('OPENAI_API_KEY')"
   ]
  },
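  {
   "cell_type": "markdown",
   "id": "5a9e0c31",
   "metadata": {},
   "source": [
    "For reference, the `.env` file mentioned in the setup instructions is a plain-text file of `KEY=value` lines. `load_dotenv()` reads these lines into environment variables. Here is a minimal example that uses a placeholder instead of a real key:\n",
    "\n",
    "```\n",
    "OPENAI_API_KEY=your-api-key-here\n",
    "```"
   ]
  },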
  {
   "cell_type": "markdown",
   "id": "7c702fcc",
   "metadata": {},
   "source": [
    "#### Validate the OpenAI API Key"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5347ee38",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Validate the API key format (this does not call the API)\n",
    "if not api_key:\n",
    "    print(\"ERROR: No API key found - please add OPENAI_API_KEY to your .env file\")\n",
    "elif api_key.strip() != api_key:\n",
    "    print(\"ERROR: API key has leading or trailing whitespace - please remove it\")\n",
    "elif not api_key.startswith(\"sk-\"):\n",
    "    print(\"WARNING: API key format may be incorrect (OpenAI keys normally start with 'sk-')\")\n",
    "else:\n",
    "    print(\"SUCCESS: API key loaded successfully\")\n",
    "\n",
    "# Initialize the OpenAI client (it reads OPENAI_API_KEY from the environment)\n",
    "openai = OpenAI()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dce21512",
   "metadata": {},
   "source": [
    "## 2. Data Input"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5d90d56",
   "metadata": {},
   "outputs": [],
   "source": [
    "# URL of the Y Combinator job posting to analyze (format: https://www.ycombinator.com/companies/<company>/jobs/<job-id>)\n",
    "job_url = \"https://www.ycombinator.com/companies/y-combinator/jobs/rq3DaTs-product-engineer\"\n",
    "\n",
    "# Replace this example resume with the actual candidate's resume text.\n",
    "resume_text = \"\"\"\n",
    "John Doe\n",
    "123 Main Street, Anytown, USA | (123) 456-7890 | john.doe@email.com\n",
    "\n",
    "Summary\n",
    "Software Engineer with 5 years of experience in web applications. \n",
    "Proficient in Python and JavaScript with a strong background in AWS.\n",
    "\n",
    "Experience\n",
    "Senior Software Engineer | Tech Solutions Inc. | 2021 - Present\n",
    "- Led development of analytics dashboard using React and Python\n",
    "- Architected microservices backend on AWS\n",
    "- Mentored junior engineers\n",
    "\n",
    "Software Engineer | Innovate Corp. | 2018 - 2021\n",
    "- Developed e-commerce platform using Python and Django\n",
    "- Wrote comprehensive unit and integration tests\n",
    "\n",
    "Skills\n",
    "Python, JavaScript, React, Flask, Django, AWS, Docker, Git\n",
    "\"\"\"\n",
    "\n"
   ]
  },
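  {
   "cell_type": "markdown",
   "id": "7b1d4e02",
   "metadata": {},
   "source": [
    "Pasting the resume into a string is convenient for a demo. As an alternative, the sketch below reads the resume from a plain-text file. The file name `resume.txt` and the helper `load_resume` are hypothetical. If the file does not exist, the example resume defined above is kept."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b1d4e03",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "# Hypothetical location of the resume; change it to match your files.\n",
    "RESUME_PATH = Path('resume.txt')\n",
    "\n",
    "def load_resume(path: Path, fallback: str) -> str:\n",
    "    \"\"\"Return the text of the file at path, or fallback if the file does not exist.\"\"\"\n",
    "    if path.is_file():\n",
    "        return path.read_text(encoding='utf-8').strip()\n",
    "    print(f'INFO: {path} not found; using the resume text defined above.')\n",
    "    return fallback\n",
    "\n",
    "resume_text = load_resume(RESUME_PATH, resume_text)"
   ]
  },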
  {
   "cell_type": "markdown",
   "id": "a3d5e484",
   "metadata": {},
   "source": [
    "## 3. Prompt Engineering"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b2b3d1b",
   "metadata": {},
   "outputs": [],
   "source": [
    "SYSTEM_PROMPT = \"\"\"\n",
    "You are a strategic career advisor. Your task is to synthesize a candidate's resume and a job description into a compelling, two-part analysis. Your goal is to create a narrative connecting the candidate's specific accomplishments to the company's needs.\n",
    "\n",
    "**Formatting:** Use markdown with bolding for emphasis. Do not use placeholders like '[Job Title]'; infer the details from the text.\n",
    "\n",
    "---\n",
    "\n",
    "# Part 1: Candidate Suitability Analysis\n",
    "\n",
    "## Executive Summary\n",
    "Provide a 2-3 sentence summary of the candidate's alignment with the role, stating your professional opinion on their potential.\n",
    "\n",
    "## Key Strengths & Evidence\n",
    "List the top 3 strengths the candidate brings. For each strength, **quote or paraphrase evidence directly from the resume's 'Experience' section**.\n",
    "* **Strength:** [Example: Scalable Backend Development] - **Evidence:** [Example: \"Architected microservices backend on AWS,\" demonstrating hands-on experience.]\n",
    "\n",
    "## Areas for Growth & Discussion\n",
    "Identify key requirements from the job description not explicitly covered in the resume. Frame these as **strategic points to address in an interview**.\n",
    "* **Topic:** [Example: TypeScript Proficiency] - **Suggested Question:** \"The role heavily uses TypeScript. Could you discuss your experience level with it and your approach to learning new languages?\"\n",
    "\n",
    "## Holistic Suitability Score\n",
    "Provide a score (e.g., 85/100) and justify it in one sentence.\n",
    "\n",
    "---\n",
    "\n",
    "# Part 2: Dynamic Cover Letter Draft\n",
    "Generate a compelling and authentic cover letter from the candidate's perspective.\n",
    "\"\"\""
   ]
  },
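  {
   "cell_type": "markdown",
   "id": "c4f28a10",
   "metadata": {},
   "source": [
    "The system prompt requests a fixed set of section headings, but the model is not guaranteed to reproduce them exactly. The sketch below is a lightweight sanity check that lists which expected headings are missing from a report. `EXPECTED_SECTIONS` and `missing_sections` are hypothetical helpers and are not part of the OpenAI API. After running **Section 6**, `missing_sections(analysis_report)` should return an empty list if every heading is present."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c4f28a11",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Headings requested by SYSTEM_PROMPT; matching is a case-insensitive substring search.\n",
    "EXPECTED_SECTIONS = [\n",
    "    'Executive Summary',\n",
    "    'Key Strengths & Evidence',\n",
    "    'Areas for Growth & Discussion',\n",
    "    'Holistic Suitability Score',\n",
    "    'Cover Letter',\n",
    "]\n",
    "\n",
    "def missing_sections(report: str) -> list[str]:\n",
    "    \"\"\"Return the expected section names that do not appear anywhere in report.\"\"\"\n",
    "    lowered = report.lower()\n",
    "    return [name for name in EXPECTED_SECTIONS if name.lower() not in lowered]"
   ]
  },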
  {
   "cell_type": "markdown",
   "id": "5146a406",
   "metadata": {},
   "source": [
    "## 4. Web Scraper"
   ]
  },
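  {
   "cell_type": "markdown",
   "id": "d9e3b6a4",
   "metadata": {},
   "source": [
    "The scraper below uses two CSS selectors: `h1` for the job title and `.prose` for the description. The sketch that follows tries both selectors on a small HTML fragment. The fragment is invented to mimic the structure the scraper expects and is not copied from a real Y Combinator page. It also joins text with `' | '` instead of a newline, so each result prints on a single line."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9e3b6a5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from bs4 import BeautifulSoup\n",
    "\n",
    "# Invented HTML that only mimics the structure the scraper expects.\n",
    "sample_html = '''\n",
    "<html><body>\n",
    "  <h1>Product Engineer</h1>\n",
    "  <div class='prose'>\n",
    "    <p>Build internal tools.</p>\n",
    "    <ul><li>Python</li><li>React</li></ul>\n",
    "  </div>\n",
    "</body></html>\n",
    "'''\n",
    "\n",
    "sample_soup = BeautifulSoup(sample_html, 'html.parser')\n",
    "# Same selectors as scrape_ycombinator_job\n",
    "sample_title = sample_soup.select_one('h1').get_text(strip=True)\n",
    "sample_description = sample_soup.select_one('.prose').get_text(separator=' | ', strip=True)\n",
    "print(sample_title)        # Product Engineer\n",
    "print(sample_description)  # Build internal tools. | Python | React"
   ]
  },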
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4d23965d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Scraper Function\n",
    "def scrape_ycombinator_job(url: str) -> str:\n",
    "    \"\"\"\n",
    "    Scrapes a single job posting from a ycombinator.com URL.\n",
    "\n",
    "    Args:\n",
    "        url: The URL of the Y Combinator job posting.\n",
    "\n",
    "    Returns:\n",
    "        The cleaned text of the job description, or an error message.\n",
    "    \"\"\"\n",
    "    print(f\"INFO: Attempting to scrape YC job posting from: {url}\")\n",
    "    \n",
    "    headers = {\n",
    "        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'\n",
    "    }\n",
    "    \n",
    "    try:\n",
    "        # Fetch the page content\n",
    "        response = requests.get(url, headers=headers, timeout=10)\n",
    "        # Raise an HTTPError for any 4xx/5xx response (e.g., 404 Not Found)\n",
    "        response.raise_for_status()\n",
    "\n",
    "        # Parse the HTML with BeautifulSoup\n",
    "        soup = BeautifulSoup(response.content, 'html.parser')\n",
    "\n",
    "        # Extract the job title (specifically from the <h1> tag)\n",
    "        title_element = soup.select_one('h1')\n",
    "        title = title_element.get_text(strip=True) if title_element else \"Job Title Not Found\"\n",
    "\n",
    "        # Extract the main job description content (from the <div class=\"prose\">)\n",
    "        description_element = soup.select_one('.prose')\n",
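    "        description = description_element.get_text(separator='\\n', strip=True) if description_element else \"\"\n",
    "        if not description:\n",
    "            print(\"WARNING: No job description found in '.prose'; the page layout may have changed.\")\n",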
    "        \n",
    "        # Combine them for the final text\n",
    "        full_text = f\"Job Title: {title}\\n\\n{description}\"\n",
    "        \n",
    "        print(\"SUCCESS: Scraping complete.\")\n",
    "        return full_text\n",
    "\n",
    "    except requests.exceptions.RequestException as e:\n",
    "        print(f\"ERROR: Scraping failed. Could not fetch URL. {e}\")\n",
    "        return \"[Scraping failed: Could not fetch the URL]\"\n",
    "    except Exception as e:\n",
    "        print(f\"ERROR: An unexpected error occurred during scraping: {e}\")\n",
    "        return \"[Scraping failed: An unexpected error occurred]\"\n"
   ]
  },
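  {
   "cell_type": "markdown",
   "id": "e6a7f190",
   "metadata": {},
   "source": [
    "The note on ethical scraping advises checking whether scraping is permitted. One concrete step is to consult the site's `robots.txt` with the standard-library `urllib.robotparser` before calling the scraper. This step is an added suggestion and is not part of the original tool; `robots_allows` is a hypothetical helper name. A `robots.txt` check does not replace reading the terms of service. Also note that `RobotFileParser.read()` sends its own request using Python's default `User-Agent`, not the browser header used above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6a7f191",
   "metadata": {},
   "outputs": [],
   "source": [
    "from urllib import robotparser\n",
    "from urllib.parse import urlparse\n",
    "\n",
    "def robots_allows(url: str, user_agent: str = '*') -> bool:\n",
    "    \"\"\"Fetch the site's robots.txt and report whether user_agent may fetch url.\"\"\"\n",
    "    parts = urlparse(url)\n",
    "    parser = robotparser.RobotFileParser()\n",
    "    parser.set_url(f'{parts.scheme}://{parts.netloc}/robots.txt')\n",
    "    # Network request: a 401/403 response is treated as 'disallow all', other 4xx\n",
    "    # responses as 'allow all'; connection failures raise urllib.error.URLError.\n",
    "    parser.read()\n",
    "    return parser.can_fetch(user_agent, url)\n",
    "\n",
    "# Example (makes a network request):\n",
    "# print(robots_allows(job_url))"
   ]
  },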
  {
   "cell_type": "markdown",
   "id": "e159596d",
   "metadata": {},
   "source": [
    "## 5. Gap Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0dc8f72",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_analysis(job_description: str, resume: str) -> str:\n",
    "    \"\"\"Sends the job description and resume to the AI and returns the analysis.\"\"\"\n",
    "    print(\"INFO: Sending data to the AI for analysis...\")\n",
    "    user_prompt = f\"\"\"Please generate the analysis based on the following documents.\n",
    "\n",
    "    **JOB DESCRIPTION:**\n",
    "    ---\n",
    "    {job_description}\n",
    "    ---\n",
    "\n",
    "    **CANDIDATE RESUME:**\n",
    "    ---\n",
    "    {resume}\n",
    "    ---\n",
    "    \"\"\"\n",
    "    messages = [\n",
    "        {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
    "        {\"role\": \"user\", \"content\": user_prompt}\n",
    "    ]\n",
    "    response = openai.chat.completions.create(\n",
    "        model=\"gpt-4o-mini\",\n",
    "        messages=messages\n",
    "    )\n",
    "    print(\"SUCCESS: Analysis complete.\")\n",
    "    return response.choices[0].message.content"
   ]
  },
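  {
   "cell_type": "markdown",
   "id": "f2c85d3e",
   "metadata": {},
   "source": [
    "The suitability score is embedded in free-form Markdown, so you must parse it out if you want to sort or filter candidates. The sketch below assumes the model writes the score in the `NN/100` form suggested by the prompt; `extract_score` is a hypothetical helper. It returns the first such match, which may not be the suitability score if the report mentions another `/100` figure earlier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2c85d3f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "def extract_score(report: str) -> int | None:\n",
    "    \"\"\"Return the first 'NN/100' value in report as an int, or None if there is none.\"\"\"\n",
    "    match = re.search(r'([0-9]{1,3}) */ *100', report)\n",
    "    return int(match.group(1)) if match else None\n",
    "\n",
    "print(extract_score('**Score:** 85/100 - strong alignment with the role'))  # 85"
   ]
  },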
  {
   "cell_type": "markdown",
   "id": "f1deb906",
   "metadata": {},
   "source": [
    "## 6. Execution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d3e57129",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Scrape the job description text from the URL\n",
    "job_description_text = scrape_ycombinator_job(job_url)\n",
    "\n",
    "# Only proceed if scraping was successful\n",
    "if not job_description_text.startswith(\"[Scraping failed\"):\n",
    "    # Run the analysis with the scraped text\n",
    "    analysis_report = get_analysis(job_description_text, resume_text)\n",
    "    # Display the final report\n",
    "    display(Markdown(analysis_report))\n",
    "else:\n",
    "    # If scraping failed, display the error message\n",
    "    display(Markdown(f\"## {job_description_text}\"))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llms",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
